No, it's german for "The Bootstrap, the" #1958

moskyb · 2023-02-16T11:39:47Z

The core of the buildkite agent (one of its cores, anyway) is a component currently called "The Bootstrap". This is the part of the agent that's actually responsible for running jobs, streaming their logs back to the buildkite mothership, and doing all the business of running hooks, finding plugins, doing git things, etc.

Were it only that simple.

What we call "the bootstrap" is actually three separate components from this repo's point of view:

- A CLI command, buildkite-agent bootstrap, which is what the agent calls when it gets a new job to run
- A Go package called bootstrap that contains most of the code that gets run to run a job
- A Go struct, bootstrap.Bootstrap, which holds the logic for job execution (though there are other peripheral job execution-related bits and bobs hanging around in the bootstrap package mentioned above)

These three things being named the same thing makes talking about them separately a pain; when talking about "the bootstrap", there's a variety of things that could be the subject of discussion.

Furthermore...

"Bootstrap" is a kind of a crappy name for what this thing does

There was a time, long ago, when this name probably fit. Fun fact: prior to v3 of the agent, the bootstrap used to be a bash script that the agent ran. At that point, the bootstrap was mostly responsible for standing up (bootstrapping, one might say) an environment in which a job (at the time a bash script and nothing more) could run.

Times have changed, however, and the bootstrap is now a (very) complex piece of Go code responsible for orchestrating all of the various tasks that need to happen before, during and after a job run.

Okay, but why change it?

Simply put, the name is confusing and it means that when we talk about the bootstrap (which we usually mean as "the job execution thingy") to our colleagues and to our customers, there's context that's lost in translation.

The bootstrap is an incredibly important part - maybe the most important part - of a job's execution lifecycle, and we fairly regularly have need to talk to customers about it. Knowing what the bootstrap actually is requires knowledge of the agent's history though, and it makes talking about these things, and intuiting how the agent actually works, a lot harder.

Consider: If you were a buildkite customer and a bikkie said "oh that's a bootstrap error", what would you think the problem is? How about if they said (foreshadowing) "I think there's an error in the job executor"?

Cool. What have you done about it?

This PR is basically a big fancy find-and-replace. The gist of it is:

- The buildkite-agent bootstrap command is deprecated (but not removed) and replaced with buildkite-agent run-job. This new command is functionally identical to the existing one, with the only change being that it doesn't have a deprecation notice
- The bootstrap package has been renamed to job. This makes a lot of names clearer IMO - consider bootstrap.Shell vs job.Shell
- The bootstrap.Bootstrap struct has been renamed to job.Executor. This is more in line with what it actually does - it executes a job

None of these names are final - i'd love some feedback on them. Two hard things and all that.

Open Questions

- Is job.Executor too similar semantically to agent.JobRunner? My opinion is no, but it's not particularly strongly held
- Should we bother scrubbing all mention of the bootstrap from the repo or is it okay to leave some of them in there?

Still to do

- Update agent/job_runner.go to:
  - Use the new nomenclature
  - Add a hook called pre-exec, identical to pre-bootstrap but with the shiny new name
  - Add a deprecation warning to the pre-bootstrap hook? Or should we just continue to allow it?
- Another round of seek-and-destroy on instances of the text bootstrap. They're pervasive!
- Local smoke testing to ensure that:
  - The agent uses buildkite-agent exec-job as its job executor by default
  - buildkite-agent bootstrap still works okay, but outputs a deprecation warning
  - The agent's bootstrap can be overridden using both --bootstrap-script and --job-executor-script
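
One way to let the old and new override settings coexist is to check several names in order and take the first one that is set. The sketch below shows that idea for environment variables only. Both variable names and the "new name wins" precedence are assumptions for illustration, and `firstSet` is a hypothetical helper, not something from the agent.

```go
package main

import (
	"fmt"
	"os"
)

// firstSet returns the value and name of the first variable in names that is
// set to a non-empty value. lookup is injected so the function can be
// exercised without touching the real environment.
func firstSet(lookup func(string) (string, bool), names ...string) (value, name string, ok bool) {
	for _, n := range names {
		if v, found := lookup(n); found && v != "" {
			return v, n, true
		}
	}
	return "", "", false
}

func main() {
	// Assumed names: the new-style variable is checked first, then the old
	// one, so existing setups keep working.
	v, from, ok := firstSet(os.LookupEnv,
		"BUILDKITE_JOB_EXECUTOR_SCRIPT_PATH",
		"BUILDKITE_BOOTSTRAP_SCRIPT_PATH",
	)
	if !ok {
		fmt.Println("no override set; using the default job executor")
		return
	}
	fmt.Printf("using %q (from %s)\n", v, from)
}
```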

pda · 2023-02-24T00:05:57Z

Naming bike-shedding: I wonder if run-job would match our other terminology closer than exec-job.
e.g. Job state will be running as a result of this not-bootstrap thing happening.
And when you look at it in Test Analytics, it'll be called a Run (I think).
Also, “exec” feels quite low-level syscall-ish, whereas the not-bootstrap does quite a lot of higher-level coordination before executing one-or-more processes/hooks/plugins/containers/things.

Apologies, I haven't looked/thought deeper about the PR more broadly, I only have this bike-shed right now 😅

pda · 2023-02-24T00:06:33Z

Also: I was totally baited into looking at this by the excellent PR title 🤡

moskyb · 2023-02-26T21:56:30Z

> I wonder if run-job would match our other terminology closer than exec-job

@pda i think i agree with you here - it's terser while also holding more information. how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)

> Also: I was totally baited into looking at this by the excellent PR title 🤡

my cunning plan has worked then

pda · 2023-03-02T23:26:35Z

> how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)

Interesting question.

I'm a proponent of ubiquitous language; it'd be a shame to have two names for one thing.

One argument against “runner” is that other platforms call their entire agent a “runner” (GitHub Actions, GitLab), and a subset of our customers will confuse it with that.

The other that you touched on is that we already have a component called JobRunner which lives in the agent outside the ~~bootstrap~~ executor/runner/thing.

I don't have the answers 🤷‍♂️

I wonder…

buildkite-agent start (some people think of this as the “Buildkite self-hosted runner”)
- loop: get jobs (specifically: Command Step jobs, aka Command Jobs)
  - internal JobRunner prepares & orchestrates running the Command Job
    - buildkite-agent bootstrap (rename to run-job?) subprocess
      - do the lifecycle of the Command Job; command, plugins, hooks etc

Maybe JobRunner becomes JobOrchestrator and bootstrap.Bootstrap becomes job.Runner? I don't love it.

Taking a step back from specifics…

the main process gets a job and wants to run it, but doesn't know how or isn't capable of doing so directly; it delegates to another layer in a subprocess to actually run the job.
that subprocess exists to run jobs, and knows how to run jobs.

Through that lens, the subprocess has a much stronger claim to “run job” or “job runner” naming, and the main process should find a different name that means “knows that a job needs running and knows how to ask a subprocess to run the job”.
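
To make the split concrete, here is a rough sketch of the main-process side: it does not run the job itself, it only prepares a subprocess that will. `jobCommand` and the `EXAMPLE_JOB_ID` variable are hypothetical, and the command is built but deliberately not started.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// jobCommand builds (but does not start) the subprocess that would run a
// job. The binary, subcommand and env entries are placeholders supplied by
// the caller.
func jobCommand(binary, subcommand string, jobEnv map[string]string) *exec.Cmd {
	cmd := exec.Command(binary, subcommand)
	// Start from the parent's environment and add the job-specific values.
	cmd.Env = os.Environ()
	for k, v := range jobEnv {
		cmd.Env = append(cmd.Env, k+"="+v)
	}
	// Pass the subprocess output straight through.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := jobCommand("buildkite-agent", "bootstrap", map[string]string{
		"EXAMPLE_JOB_ID": "1234", // made-up variable for the example
	})
	fmt.Println("would run:", cmd.Args)
}
```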

pda · 2023-03-02T23:38:10Z

Possible alternative names for agent.JobRunner (i.e. the bit that doesn't actually execute the job, it just kicks it off elsewhere)

agent.JobManager
~~agent.JobForker~~
agent.JobOrchestrator
agent.JobStarter
agent.JobSupervisor
agent.JobInvoker

None of those feel great. What does it actually do?

- Starts the subprocess to run the job
  - Collates the correct env to pass to that process
- Streams stdout / stderr / header times to the API
- Experimentally knows how to run jobs in k8s/etc instead of as a subprocess

The “k8s/etc” bit means it's not a JobForker.

I'd call it JobDispatcher except that means something different server-side, and the log streaming etc goes a bit beyond just “dispatching”.

agent.JobManager is okayish, to the extent that “manager” is ever a good name for a software component 😬

Maybe it's a JobInvoker but that's just adding yet another synonym for “run” / “execute”.

The fact that it's learning to run jobs in different ways (subprocess / k8s / …) feels important here. Again, “dispatcher” kind of suits that. So does “strategy”.

moskyb · 2023-03-03T01:43:38Z

@pda very interesting thoughts 🤔 i agree with you that there remains some confusion about the role of the agent.JobRunner vs job.Executor, but how would you feel about making that change at a later date? my take is that the current setup makes things clearer, though maybe not as clear as they possibly could be, but it's a step in the right direction.

the good thing is that those names (job.Executor and agent.JobRunner) are both completely internal, and can be pretty easily changed

triarius

Really pumped for this to happen!

For cleaning up all references to bootstrap, it's probably fine to do this when we delete the bootstrap command. We should also change the bootstrap-script config key then. For now, we have to keep this config key.

As for the name of the cli command, I prefer subject-object-verb order to subject-verb-object order (despite being an English speaker and vim user). See https://cosine.blue/2019-09-06-kakoune.html, https://simblob.blogspot.com/2019/10/verb-noun-vs-noun-verb.html

So I prefer

buildkite-agent job run

This has the advantage that if we want to add other actions you can perform on a job, we can nest them under the same job subcommand namespace.

SOV order is also consistent with what we have done for the OIDC. There, the command is buildkite-agent oidc request-token.

DrJosh9000

Firstly, bravo for doing this 👏 it's grungy and fiddly work, so you earn A Kudos from me for taking it on. 🎆

I'm pumped to get this landed! Sorry it's taken me a while to review it, I wanted to give it the review it deserves.

Code looks pretty good! Unfortunately I don't have any particular opinion on the naming.

DrJosh9000 · 2023-03-09T04:04:30Z

agent/job_runner.go

+	hookExit := r.preExecHook(ctx, "pre-bootstrap")
+	hookExit = r.preExecHook(ctx, "pre-exec")


Is the change in behaviour intended here? The hookExit from pre-bootstrap is overwritten by the pre-exec hookExit. So pre-bootstrap would no longer be able to reject the job.

moskyb · 2023-03-10

oops, it totally wasn't! fixed now in 088fa3a
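
For reference, one way to avoid this class of bug is to stop at the first hook that rejects the job, so a later result can never overwrite an earlier rejection. The sketch below illustrates that idea only; it is not necessarily what 088fa3a does, and `hookRunner` and `preExecHooksPass` are hypothetical names.

```go
package main

import "fmt"

// hookRunner is a stand-in for running a named hook. It returns true if the
// hook allows the job to proceed.
type hookRunner func(name string) (ok bool)

// preExecHooksPass runs each hook in order and stops at the first one that
// rejects the job, reporting which hook did the rejecting.
func preExecHooksPass(run hookRunner, names ...string) (rejectedBy string, ok bool) {
	for _, name := range names {
		if !run(name) {
			return name, false
		}
	}
	return "", true
}

func main() {
	// pre-bootstrap rejects the job; pre-exec would have accepted it.
	results := map[string]bool{"pre-bootstrap": false, "pre-exec": true}
	run := func(name string) bool { return results[name] }

	by, ok := preExecHooksPass(run, "pre-bootstrap", "pre-exec")
	fmt.Println(ok, by) // prints: false pre-bootstrap
}
```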


DrJosh9000 · 2023-03-09T04:12:51Z

job/integration/git.go

@@ -132,8 +132,9 @@ func (gr *gitRepository) Close() error {
 func (gr *gitRepository) Execute(args ...string) (string, error) {
 	path, err := exec.LookPath("git")
 	if err != nil {
-		return "", err
+		return "", fmt.Errorf("finding git executable on path: %w", err)



lox · 2023-06-01T07:18:41Z

I quite like agent.JobSupervisor for the current agent.JobRunner. Prior art from supervisord.

I'd always imagined that we'd add "Executors" which were strategies for executing the bootstrap; what we do now is a LocalShellExecutor or similar. We've built a DockerExecutor at CashApp, and I've built an AmazonECSExecutor in the past.

Finding the right name for the bootstrap is a real challenge. The architecture we've built at CashApp where we run the buildkite-agent bootstrap in a docker container (the logical extension of https://github.com/buildkite/docker-bootstrap-example) has really exposed the confusing-ness of the name. The bootstrap is almost not even part of the agent anymore, it could even be running on a totally different host depending on the executor.

What if you actually decoupled it from the buildkite-agent binary? What if it was a buildkite-agent-job-runtime? That also plays into the bk cli and wanting to run a job locally (which actually doesn't need an agent).

The other aspect here of what the bootstrap does is it manages phases (I wish I'd called this stages), hooks and plugins. I've frequently wanted more granular access to these things, for instance being able to call buildkite-agent bootstrap default-checkout-phase directly.

If I was pushed to pick a name for a straight sub-command rename, I'd actually aim to extract it out of the job subcommand to leave room for job commands that operate on the active job (the bootstrap does not in the same way that the other commands do). What about buildkite-agent job-runtime or buildkite-agent job-kernel execute? 😅

moskyb requested review from DrJosh9000 and triarius February 16, 2023 11:40

moskyb force-pushed the s-bootstrap-executor-g branch from 69f0510 to 49d5d3f Compare February 17, 2023 05:17

moskyb force-pushed the s-bootstrap-executor-g branch from e0cd7ca to 351a916 Compare March 2, 2023 06:51

moskyb marked this pull request as ready for review March 2, 2023 06:52

moskyb force-pushed the s-bootstrap-executor-g branch from 351a916 to 303f0d6 Compare March 2, 2023 21:06

triarius reviewed Mar 7, 2023


DrJosh9000 reviewed Mar 9, 2023


moskyb force-pushed the s-bootstrap-executor-g branch 3 times, most recently from d43a835 to 93e996a Compare March 15, 2023 02:27

moskyb added 12 commits May 16, 2023 10:09

Rename package bootstrap -> job

9e0323c

Rename job.Bootstrap -> job.Executor

ddbeb21

Add new command buildkite-agent exec-job, and deprecate bootstrap

a430283

`exec-job` does the exact same thing as bootstrap, it just has a much better name

Add pre-exec hook identical to pre-bootstrap

f7aebb2

Deprecate existing bootstrap command

2cbbfca

Rename exec-job -> run-job

15ea238

Update buildkite-agent bootstrap deprecation message with more detail

bf40a34

Fix issues with renamed envar, don't rename it, just allow multiple e…

46eff4c

…nvars

Tab-align help text in readme

c7186b0

Rename command run-job to job run

8913d7f

Fix pre-exec/pre-bootstrap hooks

d8e2ee8

Fix broken link

421f4fc

moskyb force-pushed the s-bootstrap-executor-g branch from 93e996a to 421f4fc Compare May 16, 2023 00:18

moskyb mentioned this pull request Jul 5, 2023

Die, Bootstrap, die: Rename package bootstrap -> job #2187

Merged

DrJosh9000 added the cleanup Cleaning up code, refactoring, etc label Jul 19, 2023
